me. When this happens, gene expressions across replicates will

have a heterogeneous distribution and only a small proportion of

expressions show an inconsistent trend. For instance, most replicates show

high expressions, but a few show very low expressions. A heterogeneous

expression distribution often happens in disease-related experiments such

ials, drug resistance and cancer diagnosis research. Evidence can

be found in the literature regarding the heterogeneous pattern of gene

expression profile [Miyachi, et al., 1993; Wani, et al., 1993; Hess, et al.,

zat, et al., 1995; Suzuki, et al., 1998; Knaust, et al., 2000;

ma, et al., 2000; Ebina, et al., 2001; Makhijani, et al., 2018;

ya, et al., 2020]. A differentially expressed gene with some outlier

expression(s) present is called a heterogeneous differentially expressed

gene and the expressions of such a gene are called the heterogeneous

expressions.

consequence of heterogeneous expressions is the potential

of the Type I error rate or the Type II error rate. Because of the

heterogeneous expressions, the discovery of DEGs is challenged when

using common methods such as the t test or the modified t test. In

addition, the conventional outlier test approaches [Dixon, 1950; Grubbs,

xon, 1951] may not be efficient. An efficient way is to embed an

outlier detection component into the DEG discovery process for a robust

DEG discovery.
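
The effect described above can be illustrated with a small sketch. The values below are synthetic and chosen only for illustration, the helper name `t_statistic` is hypothetical, and Welch's form of the t statistic is an assumption, since the text does not say which variant is meant. A single very low replicate inflates the group variance and shrinks the t statistic, which is one way a true DEG could be missed (a Type II error).

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(x, y):
    """Welch two-sample t statistic (assumed variant, for illustration)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


# Synthetic expression values, not taken from any real data set.
control = [5.0, 5.2, 4.9, 5.1, 5.0, 4.8]
treated_clean = [8.0, 8.2, 7.9, 8.1, 8.0, 7.8]
# Same gene, but one replicate shows a very low expression.
treated_outlier = [8.0, 8.2, 7.9, 8.1, 8.0, 0.5]

t_clean = t_statistic(treated_clean, control)
t_outlier = t_statistic(treated_outlier, control)

print(f"t without outlier: {t_clean:.2f}")
print(f"t with one outlier: {t_outlier:.2f}")
```

With these made-up numbers the t statistic drops sharply once the outlier replicate is included, even though five of the six treated replicates are unchanged.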

An example of heterogeneous gene expression

9 is a data set used for breast cancer diagnosis and it is composed

rmal tumour samples and 14 cancer samples [Tripathi, et al.,

is interesting to know how heterogeneous gene expression

s the DEG discovery for this data set. First, a matrix of the normal

replicates and a matrix of the cancer replicates of the data were extracted

from the whole data spreadsheet. For each gene, a p value was obtained

using the t test between the normal replicates and the cancer replicates.
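
A minimal sketch of the per-gene test described here, assuming a pooled-variance Student t test (the text does not specify the variant) and two tiny made-up matrices with one row per gene. The function names are hypothetical. The p value is obtained by numerically integrating the t density with Simpson's rule, which is adequate for illustration but less accurate than a statistics library for extremely small p values.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p value: 1 minus the Simpson integral over [-|t|, |t|]."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = 2 * a / steps  # steps must be even for Simpson's rule
    total = t_pdf(-a, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-a + i * h, df)
    return max(0.0, 1.0 - total * h / 3)


def raw_p_value(normal, cancer):
    """Pooled-variance t test between two groups of replicates for one gene."""
    n1, n2 = len(normal), len(cancer)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(normal) + (n2 - 1) * variance(cancer)) / df
    t = (mean(cancer) - mean(normal)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return two_sided_p(t, df)


# Made-up matrices: one row per gene, one column per replicate.
normal_matrix = [[5.0, 5.2, 4.9, 5.1], [3.0, 3.1, 2.9, 3.0]]
cancer_matrix = [[8.0, 8.2, 7.9, 8.1], [3.1, 2.9, 3.0, 3.0]]

raw_p = [raw_p_value(n, c) for n, c in zip(normal_matrix, cancer_matrix)]
for gene, p in enumerate(raw_p):
    print(f"gene {gene}: raw p = {p:.3g}")
```

In this toy input the first gene differs clearly between the groups and the second does not, so the first p value is small and the second is large.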

This p value was called a raw p value. Afterwards, whether the cancer